NSF PAR Search | NSF Public Access Repository


dbCAN3: automated carbohydrate-active enzyme and substrate annotation

https://doi.org/10.1093/nar/gkad328

Zheng, Jinfang; Ge, Qiwei; Yan, Yuchen; Zhang, Xinpeng; Huang, Le; Yin, Yanbin (May 2023, Nucleic Acids Research)

Abstract Carbohydrate active enzymes (CAZymes) are made by various organisms for complex carbohydrate metabolism. Genome mining of CAZymes has become a routine data analysis in (meta-)genome projects, owing to the importance of CAZymes in bioenergy, microbiome, nutrition, agriculture, and global carbon recycling. In 2012, dbCAN was provided as an online web server for automated CAZyme annotation. dbCAN2 (https://bcb.unl.edu/dbCAN2) was further developed in 2018 as a meta server to combine multiple tools for improved CAZyme annotation. dbCAN2 also included CGC-Finder, a tool for identifying CAZyme gene clusters (CGCs) in (meta-)genomes. We have updated the meta server to dbCAN3 with the following new functions and components: (i) dbCAN-sub as a profile Hidden Markov Model database (HMMdb) for substrate prediction at the CAZyme subfamily level; (ii) searching against experimentally characterized polysaccharide utilization loci (PULs) with known glycan substrates of the dbCAN-PUL database for substrate prediction at the CGC level; (iii) a majority voting method to consider all CAZymes with substrate predicted from dbCAN-sub for substrate prediction at the CGC level; (iv) improved data browsing and visualization of substrate prediction results on the website. In summary, dbCAN3 not only inherits all the functions of dbCAN2, but also integrates three new methods for glycan substrate prediction.
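The majority voting idea in item (iii) can be illustrated with a minimal sketch. This is not the dbCAN3 implementation: the function name, the `min_votes` threshold, and the rule that ties yield no prediction are all assumptions made for illustration. The input is assumed to be one predicted substrate (or None) per CAZyme in a CGC.

```python
from collections import Counter
from typing import Optional


def vote_cgc_substrate(
    cazyme_substrates: list[Optional[str]],
    min_votes: int = 2,  # hypothetical threshold, not taken from the paper
) -> Optional[str]:
    """Return the most frequently predicted substrate among a CGC's CAZymes.

    CAZymes without a substrate prediction (None or "") do not vote.
    Returns None when nothing votes, when the top substrate has fewer than
    min_votes votes, or when two substrates tie for first place (the tie
    rule is an assumption of this sketch).
    """
    votes = Counter(s for s in cazyme_substrates if s)
    if not votes:
        return None
    ranked = votes.most_common()
    top_substrate, top_count = ranked[0]
    if top_count < min_votes:
        return None
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None
    return top_substrate
```

For example, a CGC whose CAZymes are predicted as xylan, xylan, no prediction, and pectin would be assigned xylan under these assumptions, while a CGC with one xylan and one pectin vote would get no assignment.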
dbCAN-seq update: CAZyme gene clusters and substrates in microbiomes

https://doi.org/10.1093/nar/gkac1068

Zheng, Jinfang; Hu, Boyang; Zhang, Xinpeng; Ge, Qiwei; Yan, Yuchen; Akresi, Jerry; Piyush, Ved; Huang, Le; Yin, Yanbin (November 2022, Nucleic Acids Research)

Abstract Carbohydrate Active EnZymes (CAZymes) are significantly important for microbial communities to thrive in carbohydrate rich environments such as animal guts, agricultural soils, forest floors, and ocean sediments. Since 2017, microbiome sequencing and assembly have produced numerous metagenome assembled genomes (MAGs). We have updated our dbCAN-seq database (https://bcb.unl.edu/dbCAN_seq) to include the following new data and features: (i) ∼498 000 CAZymes and ∼169 000 CAZyme gene clusters (CGCs) from 9421 MAGs of four ecological (human gut, human oral, cow rumen, and marine) environments; (ii) Glycan substrates for 41 447 (24.54%) CGCs inferred by two novel approaches (dbCAN-PUL homology search and eCAMI subfamily majority voting) (the two approaches agreed on 4183 CGCs for substrate assignments); (iii) A redesigned CGC page to include the graphical display of CGC gene compositions, the alignment of query CGC and subject PUL (polysaccharide utilization loci) of dbCAN-PUL, and the eCAMI subfamily table to support the predicted substrates; (iv) A statistics page to organize all the data for easy CGC access according to substrates and taxonomic phyla; and (v) A batch download page. In summary, this updated dbCAN-seq database highlights glycan substrates predicted for CGCs from microbiomes. Future work will implement the substrate prediction function in our dbCAN2 web server.
The chromosome-level rambutan genome reveals a significant role of segmental duplication in the expansion of resistance genes

https://doi.org/10.1093/hr/uhac014

Zheng, Jinfang; Meinhardt, Lyndel W; Goenaga, Ricardo; Matsumoto, Tracie; Zhang, Dapeng; Yin, Yanbin (January 2022, Horticulture Research)

The chromosome-level genome of dragon fruit reveals whole-genome duplication and chromosomal co-localization of betacyanin biosynthetic genes

https://doi.org/10.1038/s41438-021-00501-6

Zheng, Jinfang; Meinhardt, Lyndel W.; Goenaga, Ricardo; Zhang, Dapeng; Yin, Yanbin (March 2021, Horticulture Research)

Abstract Dragon fruits are tropical fruits economically important for agricultural industries. As members of the family of Cactaceae, they have evolved to adapt to the arid environment. Here we report the draft genome of Hylocereus undatus, commercially known as the white-fleshed dragon fruit. The chromosomal level genome assembly contains 11 longest scaffolds corresponding to the 11 chromosomes of H. undatus. Genome annotation of H. undatus found ~29,000 protein-coding genes, similar to Carnegiea gigantea (saguaro). Whole-genome duplication (WGD) analysis revealed a WGD event in the last common ancestor of Cactaceae followed by extensive genome rearrangements. The divergence time between H. undatus and C. gigantea was estimated to be 9.18 MYA. Functional enrichment analysis of orthologous gene clusters (OGCs) in six Cactaceae plants found significantly enriched OGCs in drought resistance. Fruit flavor-related functions were overrepresented in OGCs that are significantly expanded in H. undatus. The H. undatus draft genome also enabled the discovery of carbohydrate and plant cell wall-related functional enrichment in dragon fruits treated with trypsin for a longer storage time. Lastly, genes of the betacyanin (a red-violet pigment and antioxidant with a very high concentration in dragon fruits) biosynthetic pathway were found to be co-localized on a 12 Mb region of one chromosome. The consequence may be a higher efficiency of betacyanin biosynthesis, which will need experimental validation in the future. The H. undatus draft genome will be a great resource to study various cactus plants.
Cadmium stress triggers significant metabolic reprogramming in Enterococcus faecium CX 2–6

https://doi.org/10.1016/j.csbj.2021.10.021

Cheng, Xin; Yang, Bowen; Zheng, Jinfang; Wei, Hongyu; Feng, Xuehuan; Yin, Yanbin (January 2021, Computational and Structural Biotechnology Journal)

dbCAN-PUL: a database of experimentally characterized CAZyme gene clusters and their substrates

https://doi.org/10.1093/nar/gkaa742

Ausland, Catherine; Zheng, Jinfang; Yi, Haidong; Yang, Bowen; Li, Tang; Feng, Xuehuan; Zheng, Bo; Yin, Yanbin (September 2020, Nucleic Acids Research)
Abstract PULs (polysaccharide utilization loci) are discrete gene clusters of CAZymes (Carbohydrate Active EnZymes) and other genes that work together to digest and utilize carbohydrate substrates. While PULs have been extensively characterized in Bacteroidetes, there exist PULs from other bacterial phyla, as well as archaea and metagenomes, that remain to be catalogued in a database for efficient retrieval. We have developed an online database dbCAN-PUL (http://bcb.unl.edu/dbCAN_PUL/) to display experimentally verified CAZyme-containing PULs from literature with pertinent metadata, sequences, and annotation. Compared to other online CAZyme and PUL resources, dbCAN-PUL has the following new features: (i) Batch download of PUL data by target substrate, species/genome, genus, or experimental characterization method; (ii) Annotation for each PUL that displays associated metadata such as substrate(s), experimental characterization method(s) and protein sequence information; (iii) Links to external annotation pages for CAZymes (CAZy), transporters (UniProt) and other genes; (iv) Display of homologous gene clusters in GenBank sequences via integrated MultiGeneBlast tool; and (v) An integrated BLASTX service available for users to query their sequences against PUL proteins in dbCAN-PUL. With these features, dbCAN-PUL will be an important repository for CAZyme and PUL research, complementing our other web servers and databases (dbCAN2, dbCAN-seq).
eCAMI: simultaneous classification and motif identification for enzyme annotation

https://doi.org/10.1093/bioinformatics/btz908

Xu, Jing; Zhang, Han; Zheng, Jinfang; Dovoedo, Philippe; Yin, Yanbin; Xu, Jinbo (ed.) (December 2019, Bioinformatics)

Abstract Motivation: Carbohydrate-active enzymes (CAZymes) are extremely important to bioenergy, human gut microbiome, and plant pathogen research and industry. Here we developed a new amino acid k-mer-based CAZyme classification, motif identification and genome annotation tool using a bipartite network algorithm. Using this tool, we classified 390 CAZyme families into thousands of subfamilies each with distinguishing k-mer peptides. These k-mers represented the characteristic motifs (in the form of a collection of conserved short peptides) of each subfamily, and thus were further used to annotate new genomes for CAZymes. This idea was also generalized to extract characteristic k-mer peptides for all the Swiss-Prot enzymes classified by the EC (enzyme commission) numbers and applied to enzyme EC prediction. Results: This new tool was implemented as a Python package named eCAMI. Benchmark analysis of eCAMI against the state-of-the-art tools on CAZyme and enzyme EC datasets found that: (i) eCAMI has the best performance in terms of accuracy and memory use for CAZyme and enzyme EC classification and annotation; (ii) the k-mer-based tools (including PPR-Hotpep, CUPP and eCAMI) perform better than homology-based tools and deep-learning tools in enzyme EC prediction. Lastly, we confirmed that the k-mer-based tools have the unique ability to identify the characteristic k-mer peptides in the predicted enzymes. Availability and implementation: https://github.com/yinlabniu/eCAMI and https://github.com/zhanglabNKU/eCAMI. Supplementary information: Supplementary data are available at Bioinformatics online.
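The general idea of annotating a protein by its characteristic k-mer peptides can be illustrated with a toy sketch. This is not the eCAMI package or its bipartite network algorithm: the function names, the default k, the `min_hits` cutoff, and the simple count-of-shared-k-mers score are all assumptions chosen for illustration. Each subfamily is assumed to already have a set of characteristic k-mers.

```python
from typing import Optional


def kmers(seq: str, k: int = 8) -> set[str]:
    """Return the set of all overlapping k-length peptides in a protein sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_subfamily(
    query: str,
    subfamily_kmers: dict[str, set[str]],
    k: int = 8,          # hypothetical default, not taken from the paper
    min_hits: int = 1,   # hypothetical cutoff for reporting a match
) -> Optional[str]:
    """Assign the query to the subfamily sharing the most characteristic k-mers.

    Returns None if no subfamilies are given or the best one shares fewer
    than min_hits k-mers with the query.
    """
    query_kmers = kmers(query, k)
    hits = {name: len(query_kmers & ks) for name, ks in subfamily_kmers.items()}
    if not hits:
        return None
    best = max(hits, key=hits.get)
    return best if hits[best] >= min_hits else None
```

The k-mers shared between the query and the winning subfamily are the peptides such a tool could report as supporting motifs, which is the interpretability advantage the abstract attributes to k-mer-based methods.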
